A corpus-based bootstrapping algorithm for Semi-Automated semantic lexicon construction† ELLEN RILOFF and JESS ICA SHEPHERD
نویسنده
چکیده
Many applications need a lexicon that represents semantic information but acquiring lexical information is time consuming. We present a corpus-based bootstrapping algorithm that assists users in creating domain-specific semantic lexicons quickly. Our algorithm uses a representative text corpus for the domain and a small set of ‘seed words’ that belong to a semantic class of interest. The algorithm hypothesizes new words that are also likely to belong to the semantic class because they occur in the same contexts as the seed words. The best hypotheses are added to the seed word list dynamically, and the process iterates in a bootstrapping fashion. When the bootstrapping process halts, a ranked list of hypothesized category words is presented to a user for review. We used this algorithm to generate a semantic lexicon for eleven semantic classes associated with the MUC-4 terrorism domain.
منابع مشابه
A corpus-based bootstrapping algorithm for Semi-Automated semantic lexicon construction
Many applications need a lexicon that represents semantic information but acquiring lexical information is time consuming. We present a corpus-based bootstrapping algorithm that assists users in creating domain-speciic semantic lexicons quickly. Our algorithm uses a representative text corpus for the domain and a small set of \seed words" that belong to a semantic class of interest. The algorit...
متن کاملCorpus-based Semantic Lexicon Induction with Web-based Corroboration
Various techniques have been developed to automatically induce semantic dictionaries from text corpora and from the Web. Our research combines corpus-based semantic lexicon induction with statistics acquired from the Web to improve the accuracy of automatically acquired domain-specific dictionaries. We use a weakly supervised bootstrapping algorithm to induce a semantic lexicon from a text corp...
متن کاملLearning Dictionaries for Information Extraction by Multi-Level Bootstrapping
Information extraction systems usually require two dictionaries: a semantic lexicon and a dictionary of extraction patterns for the domain. We present a multilevel bootstrapping algorithm that generates both the semantic lexicon and extraction patterns simultaneously. As input, our technique requires only unannotated training texts and a handful of seed words for a category. We use a mutual boo...
متن کاملA Bootstrapping Method for Learning Semantic Lexicons using Extraction Pattern Contexts
This paper describes a bootstrapping algorithm called Basilisk that learns highquality semantic lexicons for multiple categories. Basilisk begins with an unannotated corpus and seed words for each semantic category, which are then bootstrapped to learn new words for each category. Basilisk hypothesizes the semantic class of a word based on collective information over a large body of extraction ...
متن کاملExploiting Strong Syntactic Heuristics and Co-Training to Learn Semantic Lexicons
We present a bootstrapping method that uses strong syntactic heuristics to learn semantic lexicons. The three sources of information are appositives, compound nouns, and ISA clauses. We apply heuristics to these syntactic structures, embed them in a bootstrapping architecture, and combine them with co-training. Results on WSJ articles and a pharmaceutical corpus show that this method obtains hi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2012